
Redesign python build dependencies - For discussion#198

Draft
couteau wants to merge 14 commits into
open-vcpkg:mainfrom
couteau:build_deps

Conversation

@couteau
Contributor

@couteau couteau commented Apr 6, 2026

I am starting this Pull Request to begin a discussion about a potential redesign of the build process for python modules in the python-registry repo. Now that QGIS4 is out, I've noticed that the python packages bundled with the app (at least on Mac) include not only the important data science packages QGIS users depend on but also all the build dependencies for those packages. We don't really need those build dependencies bundled and distributed with QGIS. E.g., QGIS probably doesn't need wheel, gpep517, or setuptools bundled with the application, but they are being included in the bundle because at build time they were necessary to build other python packages.

In an attempt to build a version of QGIS that leaves out these packages, I tried building with different host and target triplets (same OS and same architecture, just a different triplet name to force installation of packages into a different build directory). My thought was that the build deps would be installed in the host build directory, but only the packages in the target build directory would be bundled into the final app. However, it appears that the design of the build scripts in vcpkg-python-scripts assumes that the host and target triplets are the same.

The first obstacle I ran into is that the packages in this repo are inconsistent about whether build dependencies are marked for host installation. So I went through all the packages and made sure dependencies like setuptools, py-meson, etc. were all marked as host dependencies. It also seems like vcpkg-python-scripts should be a host dependency too, though it only depends on python3, so it probably doesn't affect the final QGIS bundle; for good measure, I marked vcpkg-python-scripts as a host dependency as well. The next problem I ran into is that the various build scripts included in vcpkg-python-scripts are hardcoded to run in the VCPKG_INSTALLED_DIR environment, but after my changes the build dependencies are installed in the VCPKG_HOST_INSTALLED_DIR environment, so the scripts can't find them and the build fails.
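For illustration, marking a build-only dependency for host installation uses the standard `"host": true` field in a port's manifest. The port name `py-example` here is hypothetical; the dependency names are only examples of what such a port might declare:

```json
{
  "name": "py-example",
  "version": "1.0.0",
  "dependencies": [
    { "name": "vcpkg-python-scripts", "host": true },
    { "name": "py-setuptools", "host": true },
    "python3"
  ]
}
```

With this marking, vcpkg installs `py-setuptools` under the host triplet's installed tree rather than the target's, which is exactly why scripts that only look at the target tree stop finding it.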

Fixing this would therefore require changes to the core python build and install scripts, and there are several ways to do it. Rather than pick one and implement it, I thought it might be more productive to raise the issue for discussion and see if there is interest in this change. Here are the possibilities I see as the most likely:

Easiest would be to run the scripts in the VCPKG_HOST_INSTALLED_DIR environment and implicitly require all build dependencies to be installed there. That should solve the problem but would retain the inflexibility of the current method.

The second option would be to detect whether vcpkg-python-scripts has been installed in the host or target environment and set the relevant variables to use the appropriate environment when executing the build scripts. This should be doable and would still be adaptable.
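A minimal sketch of that detection in a portfile, assuming the scripts ship a `share/vcpkg-python-scripts` directory (the exact path is illustrative); `CURRENT_HOST_INSTALLED_DIR` and `CURRENT_INSTALLED_DIR` are the variables vcpkg provides in portfile context:

```cmake
# Prefer the host installed tree if the scripts were installed there;
# fall back to the target tree when host and target triplets match
# or the scripts were not marked as a host dependency.
if(EXISTS "${CURRENT_HOST_INSTALLED_DIR}/share/vcpkg-python-scripts")
    set(PYTHON_SCRIPTS_ENV_DIR "${CURRENT_HOST_INSTALLED_DIR}")
else()
    set(PYTHON_SCRIPTS_ENV_DIR "${CURRENT_INSTALLED_DIR}")
endif()
```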

And the third idea I had would be to use vcpkg's x_vcpkg_get_python_packages utility to create a virtual environment for building python packages rather than using the application's build environment. The virtual environment could be created with the most common and/or required build components such as gpep517, setuptools, wheel, flit-core, etc., and others could be available as features -- e.g. meson, hatchling. And if a package has unusual build requirements, the path to the build environment would be available in a CMake variable and could be used to install the additional dependencies using pip. Or we could create a utility function to do that more easily. Since the build deps would all be installed in a virtual environment, they wouldn't muck up the application's python environment, which will then have only the packages actually needed at run time.
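A rough sketch of what that could look like in a portfile, assuming the vcpkg-get-python-packages helper port is available and `PYTHON3` has been set via `vcpkg_find_acquire_program`; the package list and log name are illustrative:

```cmake
# Create (or reuse) a venv seeded with common PEP 517 build tools and
# get back a python executable inside it.
x_vcpkg_get_python_packages(
    PYTHON_VERSION "3"
    PYTHON_EXECUTABLE "${PYTHON3}"
    PACKAGES gpep517 setuptools wheel flit-core
    OUT_PYTHON_VAR VENV_PYTHON
)

# A port with unusual build requirements could then add them with pip.
vcpkg_execute_required_process(
    COMMAND "${VENV_PYTHON}" -m pip install meson hatchling
    WORKING_DIRECTORY "${CURRENT_BUILDTREES_DIR}"
    LOGNAME "pip-extra-build-deps-${TARGET_TRIPLET}"
)
```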

Related to these issues, I also noticed that py-gpep517 is currently being included inconsistently. It seems that most of the packages in the repo don't depend on it directly, but some of the other build packages (e.g. py-packaging) do depend on it, even though it isn't actually a build requirement for them. Notably vcpkg-python-scripts doesn't depend on py-gpep517 even though it uses the gpep517 module to build and install python packages. The result is that packages that use the scripts but don't depend directly or indirectly on py-gpep517 will run into build-time errors. I think the solution is probably for vcpkg-python-scripts to depend on py-gpep517 since it does in fact need it. However, if we were to redesign the python build scripts to use a virtual environment, that would obviate this issue. But for now, this PR removes the py-gpep517 dependency for packages that don't require it and adds a dependency on py-gpep517 to the vcpkg-python-scripts port.

@m-kuhn I'm eager to hear what you think.

@couteau couteau changed the title Redesign python build dependencies - NOT READY TO MERGE Redesign python build dependencies - For discussion - NOT READY TO MERGE Apr 6, 2026
@couteau couteau marked this pull request as draft April 6, 2026 16:39
@couteau couteau changed the title Redesign python build dependencies - For discussion - NOT READY TO MERGE Redesign python build dependencies - For discussion Apr 6, 2026
@m-kuhn
Copy link
Copy Markdown
Contributor

m-kuhn commented Apr 8, 2026

Very good points here. Indeed, the separation between host and target is not consistent at the moment; a few dependencies can clearly be moved to host dependencies.

For QGIS we have always used the same host and target triplets, as this reduces build times considerably (e.g. qt will be built twice otherwise). This is not ideal, but a pragmatic choice for faster development roundtrips and limited CI resources.

Easiest would be to run the scripts in the VCPKG_HOST_INSTALLED_DIR environment and implicitly require all build dependencies to be installed there. That should solve the problem but would retain the inflexibility of the current method.

This would be the simplest path forward and my preference if we don't want to open up a big change.

The second option would be to detect whether vcpkg-python-scripts has been installed in the host or target environment and set the relevant variables to use the appropriate environment when executing the build scripts. This should be doable and would still be adaptable.

I think this shouldn't be needed; the vcpkg-python-scripts dependency should always be installed in the host environment (which will be equal to the target environment if both triplets match). Or do you see a reason why this would be required?

And the third idea I had would be to use vcpkg's x_vcpkg_get_python_packages utility to create a virtual environment for building python packages rather than using the application's build environment. The virtual environment could be created with the most common and/or required build components such as gpep517, setuptools, wheel, flit-core, etc. and others could be available as features -- e.g. meson, hatchling. And if a package has unusual build requirements, the path to the build environment would be available in a CMAKE variable, and could be used to install the additional dependences using pip. Or we could create a utility function to do that more easily. Since the build deps would all be installed in a virtual environment, they wouldn't muck up the application's python environment which will then have only the packages actually needed at run time.

This is something that I find very interesting but never pursued. This would also have the advantage of always using one python version (system python) during the build process (I am pretty sure at the moment we still fall back to using system python in some code paths while we use vcpkg python for most others). The main uncertainty I had here is whether this will create a new venv for every package we build. IIRC it does, and the package cache doesn't work in this scenario, which adds some network traffic, wait times and an increased potential for instability due to network errors. Another thing to look at would be version pinning and SHA checks for the downloaded packages.

As for the specific proposal in this PR (moving py-gpep517 to a vcpkg-python-scripts dependency): this looks good to me, but as you mentioned it only makes sense if we don't generally switch to `x_vcpkg_get_python_packages`.

@couteau
Copy link
Copy Markdown
Contributor Author

couteau commented Apr 13, 2026

@m-kuhn I just pushed a few commits that start to sketch out what I had in mind with the third option of using a virtual environment for building python packages. It is still a work in progress, but it works to build most packages on macOS. I haven't tested Windows yet. It currently uses the vcpkg-installed python, not the system python. Using the system python could be made an option, but I worry that if packages are built against a different version of the python binaries than the one bundled with QGIS, it could create problems.

The main uncertainty I had here is if this will create a new venv for every package we build.

The approach here creates a single virtual environment in the CURRENT_HOST_INSTALLED_DIR/tools directory and uses that to build all the packages. No separate environment is created for each package.

Another thing to look at would be version pinning and SHA checks for the downloaded packages.

Yes, this could become an issue if packages being built need different or conflicting versions of the same build tool. The current approach doesn't account for that. For the tools installed through features of vcpkg-python-scripts, I currently just install the latest version available from PyPI, which so far has not caused any issues, though I don't yet have every package building successfully.

I see what you mean about PyQt needing to build qtbase for both host and target if they differ -- I had the same issue with py-qscintilla needing qscintilla in both host and target environments. I so far have not run into that for other packages. There are a few python build requirements that are needed at both build-time and runtime (e.g., numpy, pybind11, and PyQt6-sip), but using a virtual env allows those build requirements to be satisfied using pip, so they don't take a lot of time or consume a lot of CPU resources (though they do, of course, require some network bandwidth to install). And you are correct that the binary cache doesn't help with this issue -- vcpkg seems to cache packages built against different triplets separately, even if they are identical. But it only builds Qt/qscintilla twice (once for the host environment, once for the target environment), not once for every package that depends on them.

The second option would be to detect whether vcpkg-python-scripts has been installed in the host or target environment and set the relevant variables to use the appropriate environment when executing the build scripts. This should be doable and would still be adaptable.

I think this shouldn't be needed; the vcpkg-python-scripts dependency should always be installed in the host environment (which will be equal to the target environment if both triplets match). Or do you see a reason why this would be required?

Right, it shouldn't be necessary. But there are currently a number of ports that don't make vcpkg-python-scripts a host dependency. That's probably an error, and if so, then we can fix those, and the scripts can then safely assume they are always installed in the host environment.

@m-kuhn
Contributor

m-kuhn commented Apr 15, 2026

@m-kuhn I just pushed a few commits that start to sketch out what I had in mind with the third option of using a virtual environment for building python packages. It is still a work in progress, but it works to build most packages on macOS. I haven't tested Windows yet. It currently uses the vcpkg-installed python, not the system python. Using the system python could be made an option, but I worry that if packages are built against a different version of the python binaries than the one bundled with QGIS, it could create problems.

I think this should not be a problem; in general, build tools and runtime tools tend to be fairly well separated. But using the vcpkg-installed python version certainly adds to standardizing tools.

Another thing to look at would be version pinning and SHA checks for the downloaded packages.

Yes, this could become an issue if packages being built need different or conflicting versions of the same build tool. The current approach doesn't account for that. For the tools installed through features of vcpkg-python-scripts, I currently just install the latest version available from PyPI, which so far has not caused any issues, though I don't yet have every package building successfully.

It's also about a proper audit trail (e.g. supply chain attacks). At the moment we know exactly what has been used to produce a specific build. If we just use "latest", that will no longer be the case. We could possibly use a requirements.txt with --require-hashes, but I don't have any experience with that.
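For reference, hash-checking mode pins every requirement to a version plus one or more expected digests, and `pip install --require-hashes -r requirements.txt` refuses anything that doesn't match. A minimal illustration of the file format; the versions are only examples, and the digests below are placeholders, not real values:

```
# install with: pip install --require-hashes -r requirements.txt
setuptools==69.5.1 \
    --hash=sha256:0000000000000000000000000000000000000000000000000000000000000000
wheel==0.43.0 \
    --hash=sha256:0000000000000000000000000000000000000000000000000000000000000000
```

In this mode pip also requires that every transitive dependency be listed and hashed, which is part of what makes it an audit trail rather than just pinning.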

I see what you mean about PyQt needing to build qtbase for both host and target if they differ -- I had the same issue with py-qscintilla needing qscintilla in both host and target environments. I so far have not run into that for other packages. There are a few python build requirements that are needed at both build-time and runtime (e.g., numpy, pybind11, and PyQt6-sip), but using a virtual env allows those build requirements to be satisfied using pip, so they don't take a lot of time or consume a lot of CPU resources (though they do, of course, require some network bandwidth to install). And you are correct that the binary cache doesn't help with this issue -- vcpkg seems to cache packages built against different triplets separately, even if they are identical. But it only builds Qt/qscintilla twice (once for the host environment, once for the target environment), not once for every package that depends on them.

Flagging things properly as host dependencies is a good move in any case, even if we don't make use of it in some build environments.
IIRC qtdeclarative was also a host dependency for some tools and takes a rather long time to build.

The second option would be to detect whether vcpkg-python-scripts has been installed in the host or target environment and set the relevant variables to use the appropriate environment when executing the build scripts. This should be doable and would still be adaptable.

I think this shouldn't be needed; the vcpkg-python-scripts dependency should always be installed in the host environment (which will be equal to the target environment if both triplets match). Or do you see a reason why this would be required?

Right, it shouldn't be necessary. But there are currently a number of ports that don't make vcpkg-python-scripts a host dependency. That's probably an error, and if so, then we can fix those, and the scripts can then safely assume they are always installed in the host environment.

Perfect.

@couteau
Contributor Author

couteau commented Apr 19, 2026

It's also about a proper audit trail (e.g. supply chain attacks). At the moment we know exactly what has been used to produce a specific build. If we just use "latest", that will no longer be the case. We could possibly use a requirements.txt with --require-hashes, but I don't have any experience with that.

@m-kuhn I added version requirements/hashes to the python build tools, so this issue should now be addressed. I also discovered that the qt libraries/packages from the main vcpkg repo do not build with different host and target triplets for the same reasons as the PyQt packages--the various qt components seem to need qtbase installed (with various features enabled, and in some cases other qt components such as qtdeclarative) in order to build, and the ports in the main repo do not install those in the host environment. I think this is mainly a header file/linking issue, and the build tools could probably be pointed to the target environment to find them so as to avoid needing to build them twice. But that would require changes to the upstream vcpkg repo. It could also potentially be worked around by adding all those requirements as host requirements in the QGIS vcpkg manifest, but that would definitely require building everything twice whenever the host and target triplets differ.

But that said, even if QGIS is always built with identical host and target triplets, there is benefit to using a virtual python environment for building python packages. It should have at least some of the benefit of avoiding packaging the build requirements with the built QGIS app. And by installing the build tools using pip rather than building them ourselves, we speed up the overall build process.

@m-kuhn
Contributor

m-kuhn commented Apr 20, 2026

It's also about a proper audit trail (e.g. supply chain attacks). At the moment we know exactly what has been used to produce a specific build. If we just use "latest", that will no longer be the case. We could possibly use a requirements.txt with --require-hashes, but I don't have any experience with that.

@m-kuhn I added version requirements/hashes to the python build tools, so this issue should now be addressed.

Interesting, what is the list of hashes for? (I was expecting a single hash per package.)
How will build requirements and their hashes be updated? I see that the same build dep (with the same hashes) is available in multiple ports.

For reference, vcpkg upstream keeps a list of package sources for msys2 packages in a central location, this might be something to follow, not sure about the best way to handle this yet: https://github.com/Microsoft/vcpkg/blob/master/scripts/cmake/vcpkg_acquire_msys.cmake#L267 .

I also discovered that the qt libraries/packages from the main vcpkg repo do not build with different host and target triplets for the same reasons as the PyQt packages--the various qt components seem to need qtbase installed (with various features enabled, and in some cases other qt components such as qtdeclarative) in order to build, and the ports in the main repo do not install those in the host environment.

For QField we do cross-builds with Qt for android and ios, where the host system is linux / macos. There is no python involved (plain upstream vcpkg registry).
You can see an example build here: https://github.com/opengisch/QField/actions/runs/24661274765/job/72107615406#step:10:125
There are plenty of host (arm64-osx) and target (arm64-ios) ports built. Host ports build and install fine and are available as tools for the target builds.

I think this is mainly a header file/linking issue and the build tools could probably be pointed to the target environment to find them so as to avoid needing to build them twice. But that would require changes to the upstream vcpkg repo. It could also potentially be worked around by adding all those requirements as host requirements in the QGIS vcpkg manifest, but that would definitely require building everything twice whenever the host and target triplets differ.

I cannot fully follow this; as mentioned before regarding cross-builds, I believe things work as expected.

But that said, even if QGIS is always built with identical host and target triplets, there is benefit to using a virtual python environment for building python packages. It should have at least some of the benefit of avoiding packaging the build requirements with the built QGIS app. And by installing the build tools using pip rather than building them ourselves, we speed up the overall build process.

🤞

@couteau
Contributor Author

couteau commented Apr 20, 2026

@m-kuhn I added version requirements/hashes to the python build tools, so this issue should now be addressed.

Interesting, what is the list of hashes for? (I was expecting a single hash per package.) How will build requirements and their hashes be updated?

Pip accounts for the fact that packages are distributed for multiple systems in both binary and source form (e.g., if you look at the downloads link for numpy, there are 71 different wheel files for various OS versions and python versions plus a source tarball, each with a different hash), so there are 72 hashes for numpy in the hash file. There is a tool (pip-compile) that will calculate all the required hashes for a specific version of a package, which is what I used to create the hash file and the hashes for the one-off build deps.
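For anyone checking a single file by hand: the digest pip verifies is just a SHA-256 over the downloaded distribution file's bytes, rendered in the `sha256:<hex>` form that the requirements file uses. A small self-contained sketch (the function name is ours, not pip's):

```python
import hashlib


def pip_style_hash(path: str) -> str:
    """Compute the hash pip checks for a downloaded wheel or sdist:
    a plain SHA-256 of the file's bytes, as 'sha256:<hexdigest>'."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large wheels don't need to fit in memory.
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return "sha256:" + digest.hexdigest()
```

Each of numpy's 72 published files gets its own digest this way, which is why the hash file carries a list per package rather than a single value.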

I see that the same build dep (with the same hashes) is available in multiple ports.

Yes, for the non-standard build deps like sip-builder (i.e., the ones that are not included as features in the vcpkg-python-scripts port), there is some duplication. We could create a central repository for all the hashes, but eventually, there will be some new port with some new build requirement that we have not included. In the case of py-pyqt6 and py-qscintilla, we could probably avoid the duplication, because py-pyqt6 is a dependency of py-qscintilla, which can therefore safely assume its sip related build dependencies are already installed.

For reference, vcpkg upstream keeps a list of package sources for msys2 packages in a central location, this might be something to follow, not sure about the best way to handle this yet: https://github.com/Microsoft/vcpkg/blob/master/scripts/cmake/vcpkg_acquire_msys.cmake#L267

I will take a look at how that is done and see if there is anything there we can emulate.

I also discovered that the qt libraries/packages from the main vcpkg repo do not build with different host and target triplets for the same reasons as the PyQt packages--the various qt components seem to need need qtbase installed (with various features enabled and and in some cases other qt components such as qtdeclarative) in order to build, and the ports in the main repo do not install those in the host environment.

For QField we do cross-builds with Qt for android and ios, where the host system is linux / macos. There is no python involved (plain upstream vcpkg registry). You can see an example build here: https://github.com/opengisch/QField/actions/runs/24661274765/job/72107615406#step:10:125 There are plenty of host (arm64-osx) and target (arm64-ios) ports built. Host ports build and install fine and are available as tools for the target builds.

I'll also look at that and see what I can learn.

@couteau
Contributor Author

couteau commented May 3, 2026

@m-kuhn - I'm still troubleshooting some Windows build issues, but I looked at some of the other issues.

First, on keeping hashes in a central location: the hashes for the common build tools used across the python ecosystem are all in vcpkg-python-scripts/packages.cmake -- that should be fairly easy to update. A few less common tools, like the sip-related tools, are defined in the packages that use them. Part of the design of this new build system is that packages with special build requirements can still use the basic build environment and add their own build requirements; PyQt's use of sip is one example. If we thought sip was a common enough build requirement (perhaps reasonable given that, at the moment, this repo seems to support QGIS and nothing else), we could add a sip feature to the vcpkg-python-scripts port and move the sip-related packages/hashes into the central packages.cmake file.

I looked at the way the msys deps are handled, and it is similar. It uses a function to define the various msys packages and their versions and hashes, and then has a central cmake file that calls the function for all the packages that are supported. I define CMake cache variables for each package with the version and a list of hashes, all in a central cmake file. I think both approaches work, and I don't see an advantage to one over the other if the main point is to make it easy to update everything in a central location -- both approaches accomplish that.
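To make the cache-variable layout concrete, a sketch of what entries in a central packages.cmake could look like; the variable naming scheme, version, and digests are illustrative placeholders, not the actual contents of the file:

```cmake
# One version plus a list of accepted distribution hashes per build tool.
# pip accepts any listed hash, so the list covers every published file
# (wheels for each platform plus the sdist).
set(PY_BUILD_TOOL_setuptools_VERSION "69.5.1"
    CACHE STRING "Pinned setuptools version for the build venv")
set(PY_BUILD_TOOL_setuptools_HASHES
    "sha256:0000000000000000000000000000000000000000000000000000000000000000;sha256:1111111111111111111111111111111111111111111111111111111111111111"
    CACHE STRING "Accepted distribution hashes, one per published file")
```

Ports with one-off build deps could set the same pair of variables locally before calling into the shared scripts, which keeps the common tools centralized without blocking unusual requirements.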

Second, I also looked at how QField handles cross-compiling with Qt. Essentially, it pulls in a lot of Qt as both host and target dependencies. In a cross-compiling scenario, that means those packages are built twice. We can do that for QGIS, too, if we are ok with building those packages twice. It shouldn't slow down the CI builds, because they don't use different host and target triplets, so everything will just get built once.

I started on this, but the dependency tree was quite deep, so I gave up. But I think QField might offer a roadmap to make sure everything needed for building is installed in the host environment. I will go back to it when I have a chance.

In the meantime, once I get the Windows build issues resolved, this new build system should work fine in a non-cross-compiled scenario.
